My Experience with aMeta

aMeta Workshop

Emrah Kırdök, Ph.D.

7/4/24

Who am I?

  • Emrah Kırdök, Ph.D.
  • Trained as a biologist
  • Working on ancient metagenomics
  • Giving lectures on bioinformatics and data analysis

Outline

  • Ancient metagenomics workflows can be complex
  • Workflow tools are essential
  • The database problem in ancient metagenomics
  • Benefits of using workflow managers

Who will benefit?

  • Early career researchers (MSc, PhD)
  • Anyone who is new to bioinformatics

Bioinformatics methodology

[Diagram: Input -> Bioinformatics workflow -> Output]

Bioinformatics methodology

Actually it is much more complicated…

[Diagram: Input1, Input2 and parameters -> Tool1 -> Output1 and Output2; Output2 and moreparameters -> Tool2 -> Output3]

Ancient metagenomics methodology

And in ancient metagenomics, it is much more complicated…

[Diagram: Fastq file -> Classification -> Classification Output -> Extract aDNA reads and authenticate -> authentication_1, authentication_2, authentication_3, authentication_4, authentication_N -> Final Report; the database feeds both Classification and authentication]

Authentication part of the pipeline

It would be very hard to authenticate one by one
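
The fan-out in the diagram can be sketched in a few lines of Python. This is only an illustration, not how aMeta implements it: `authenticate`, `authenticate_all` and the taxon names are hypothetical placeholders, and a real job would extract the reads for a taxon and compute aDNA metrics.

```python
from concurrent.futures import ThreadPoolExecutor


def authenticate(taxon: str) -> dict:
    """Hypothetical placeholder for one authentication job.

    A real job would extract the reads assigned to `taxon` and
    evaluate them; here we only return a small record.
    """
    return {"taxon": taxon, "status": "checked"}


def authenticate_all(taxa: list[str], workers: int = 4) -> list[dict]:
    """Run one authentication job per taxon and collect the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input list
        return list(pool.map(authenticate, taxa))


# Hypothetical candidates coming out of the classification step
candidates = ["taxon_1", "taxon_2", "taxon_3"]
report = authenticate_all(candidates)
```

A workflow manager does the same thing at a larger scale: it creates one job per candidate and gathers all results for the final report.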

Ancient metagenomics methodology

It is even more complicated in a real situation:

The aMeta Workflow

Workflows

  • A workflow is your materials and methods
  • You need to know all inputs and outputs
  • Bash scripts are quite easy to write
  • But every time you run a script, it starts from the beginning
  • Job dependency?
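
The two problems above, rerunning everything and tracking job dependencies, can be sketched in plain Python. This is a simplified illustration of the idea behind workflow managers, not Snakemake's actual implementation; the step names and the `needs_update` helper are assumptions made for the example.

```python
import os
from graphlib import TopologicalSorter


def needs_update(output: str, inputs: list[str]) -> bool:
    """Return True if `output` is missing or older than any of its inputs."""
    if not os.path.exists(output):
        return True
    out_time = os.path.getmtime(output)
    return any(os.path.getmtime(path) > out_time for path in inputs)


# Each step maps to the set of steps it depends on (hypothetical names)
dependencies = {
    "classification": set(),
    "authentication": {"classification"},
    "report": {"authentication"},
}

# Dependencies always come before the steps that need them
order = list(TopologicalSorter(dependencies).static_order())
```

A plain bash script has neither piece: it runs every command in the order written, whether or not the outputs already exist.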

My first workflow

aMeta workflow

  • aMeta allowed me to use Snakemake in my research
  • A robust system that can automatically submit Slurm jobs
  • The beginning is hard, but you will benefit later
  • But it can sometimes be cryptic

aMeta

KrakenUniq classification rule as an example

Databases

  • Metagenomic databases should be big
  • Alignment is needed for aDNA authentication
  • Alignment-based databases are quite big!
  • The two-step process in aMeta
    • memory usage optimization

Good practices

  • Using Snakemake forces you to follow good practices
  • You will start doing reproducible bioinformatics

Good practices

Project name
├── LICENSE
├── README.md          <- The top-level README
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, read-only data.
│
├── docs               <- All project documentation goes here
│   ├── reports
│   └── presentations
│
├── workflow           <- Source code and Snakemake rules
│   ├── Snakefile      <- The main Snakefile; should include all sub-rules
│   │
│   ├── rules          <- Separate Snakefile rules
│   │
│   ├── environments   <- Conda environments
│   │
│   └── singularity    <- Singularity containers
│ 
└── results
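
A layout like this can be created with a short script. The sketch below is only one possible way to do it: `create_layout` is a hypothetical helper, and it writes into a temporary directory so that it is safe to run anywhere.

```python
import tempfile
from pathlib import Path

# Directories taken from the tree above
LAYOUT = [
    "data/external",
    "data/interim",
    "data/processed",
    "data/raw",
    "docs/reports",
    "docs/presentations",
    "workflow/rules",
    "workflow/environments",
    "workflow/singularity",
    "results",
]


def create_layout(root: Path) -> list[Path]:
    """Create the project skeleton under `root` and return the directories."""
    created = []
    for relative in LAYOUT:
        directory = root / relative
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    # Empty placeholder files; fill them in for a real project
    for name in ("LICENSE", "README.md", "workflow/Snakefile"):
        (root / name).touch()
    return created


# Build the skeleton inside a temporary directory
project = Path(tempfile.mkdtemp()) / "project_name"
dirs = create_layout(project)
```

Replace the temporary directory with a real path to start a new project with the same structure.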

Good practices

  • Version control
    • Git + GitHub
    • also serves as a backup
  • Conda environments
    • Automatically install dependencies

Using snakemake

  • Forces you to document every step
    • manual modifications